-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Grammars Compiler #3915
New Grammars Compiler #3915
Conversation
This is now always checked by the grammars compiler
Here are all the known issues in existing grammars that the compiler is detecting:
As you can see, all the grammar now parse successfully. The main remaining issues are regarding keys which we don't know how to handle, and missing dependencies. For the unknown keys, there's already a whitelist in the compiler with keys which we don't handle but are "acceptable" (namely, The dependencies are a trickier thing. Lots of grammars in the existing library attempt to include grammars that don't exist. The biggest offender was |
@@ -1,10 +1,9 @@ | |||
--- | |||
type: grammar | |||
name: ruby.tmbundle | |||
name: javadoc.tmbundle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh!? Why's this happening? (I've not looked too deeply or played with this, just noticed it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is two issues in one:
-
There was a
ruby.tmbundle
license in the Licenses folder which was just outdated (since we're now using the Atom Ruby package, which is not calledruby.tmbundle
). When I ran the license checker, it removed it. -
I added the new Javadoc grammar, which happens to have a license identical to
ruby.tmbundle
(and to most other Textmate grammars, they all share the same license). -
Git saw a removed license, and a new added one, and marked it as a rename (h e u r i s t i c s) instead of an outdated license removed and a new one added.
@@ -84,13 +84,13 @@ if repo_old | |||
log "Deregistering: #{repo_old}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move scripts to the new tools/
directory or is there a clear distinction between the two (besides the fact that one contains scripting language programs)? If the idea is to always use tools through scripts maybe tools/
should be a subdirectory (with a new name?) in script/
...?
Thanks for this @vmg! For the remaining issues in grammars, do you want to fix them all before merging this? Do you need help to report them upstream and try to fix them? I'm not very familiar with Go, but is it necessary to commit the dependencies ( |
Excellent work! 😀 👍 Does the compiler handle any PCRE-incompatible regex extensions, such as Oniguruma's There are other incompatibilities between regular expression engines, but I'd say the matter of |
They are non-blocking issues, since the compiler can work around them (e.g. by removing the unknown dependencies), so I'll merge this before they are fully fixed.
I would love that. I don't have the time to fix them myself like I did for the parsing errors because I'm working on the backend this week.
It is not necessary, but it is customary. Since we're distributing a pre-built Docker image, I think I can just not commit all these code. I'll remove it!
It does not, and this is a fantastic suggestion I hadn't thought of. I'll try to work on this today. |
I've not played with this yet, but one thing that did cross my mind is this adds a new little barrier to entry when adding new grammars as it now requires Docker be installed. We definitely need to add this requirement to the CONTRIBUTING.md file |
It does remove the requirement of having Node.js and NPM installed, so overall I'm not concerned about bringing in a Docker dependency. |
I better explain why this one is happening, since it's coming from one of my grammars:
On a related note, see the recent discussion over at Atom concerning their new grammar-parsing system that's nearing completion soon... you might need to update the list of unrecognised properties at some point. |
Does that mean we can now update to the most recent version of https://github.com/gandm/language-babel and finally deal with #3044? Will this also help with russian characters as detailed in #3291? |
Looking at the output of the compiler, it seems like the JSX grammar we're currently running (i.e. the pinned fork) is also broken. I tried upgrading to upstream and the grammar has the same (trivially fixable) issues: Lots of grammar rules use Oniguruma-specific syntax It seems like a trivial enough fix if somebody would like to attempt to upstream it. It would definitely be backwards compatible, and I think it'd allow us to switch back to master.
This is an unrelated issue, I'm afraid. PCRE supports the syntax just fine, it just doesn't interpret characters in the extended Unicode range like Russian. |
In brighter news, I've just tried enabling support for Unicode properties and I don't see a significant performance regression -- less than 5%. We may be able to leave them turned on as we roll out the new service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WFM locally without any errors. ✨
Following up on #3924 with all the issues that the compiler detected. |
@lildude: As far as I can tell, this is the case. When calling |
@vmg I've not tested with adding a new grammar, but I did test with replacing a grammar (our longest serving desirable):
After doing the above, I ran
On a side note, it would be nice to be able to query just a single grammar too. I also noticed that the script doesn't exit if Docker isn't available/running:
Want new issues for all of these points? |
I see. The compiler is outputting error messages, but I'm afraid that the |
Alright! Here's the new grammars compiler I've been working on. The old Ruby
script/convert-grammars
has been replaced by an actual compiler written in Go. The source code can be found intools/grammars/
.To note:
The compiler is being distributed as a Docker image,
linguist/grammars-compiler:latest
on Docker.io. The image is self-contained (duh) and includes all the dependencies, most notably NPM and the CSON parser it needs to run.package.json
has been removed from this repository because eek.The new
script/grammars-compiler
is a thin wrapper over running the Docker image with the arguments you pass in.The main goal of the compiler is ensuring that all the grammars we add to Linguist are sensible and not malformed. This was not the case previously. I've had to manually send PRs to many upstream repositories with different fixes for issues that the compiler has detected. Their corresponding submodules have been upgraded in d08375d
The compiler also normalizes the grammars into a fixed schema, removing superfluous keys and unifying content structures. This schema is serializable to JSON (like the old compiler) or to a Protobuf library, which we're going to start using internally.
As it stands, this build is now green, although the compilation step is not fully "clean" yet. I'm going to open another issue with a list of outstanding issues in the existing grammars.
cc @github/linguist